Overview

Dataset statistics

Number of variables: 29
Number of observations: 30000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.7 MiB
Average record size in memory: 270.1 B

Variable types

NUM: 15
CAT: 13
BOOL: 1

Reproduction

Analysis started: 2020-03-08 05:05:00.237763
Analysis finished: 2020-03-08 05:07:38.094722
Version: pandas-profiling v2.5.3
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
BILL_AMT2 is highly correlated with BILL_AMT1 and 1 other field [High Correlation]
BILL_AMT1 is highly correlated with BILL_AMT2 [High Correlation]
BILL_AMT3 is highly correlated with BILL_AMT2 and 1 other field [High Correlation]
BILL_AMT4 is highly correlated with BILL_AMT3 and 2 other fields [High Correlation]
BILL_AMT5 is highly correlated with BILL_AMT4 and 1 other field [High Correlation]
BILL_AMT6 is highly correlated with BILL_AMT4 and 1 other field [High Correlation]
IS_FEMALE is highly correlated with SEX [High Correlation]
SEX is highly correlated with IS_FEMALE [High Correlation]
PAY_AMT2 is highly skewed (γ1 = 30.45381745) [Skewed]
BILL_AMT1 has 2008 (6.7%) zeros [Zeros]
BILL_AMT2 has 2506 (8.4%) zeros [Zeros]
BILL_AMT3 has 2870 (9.6%) zeros [Zeros]
BILL_AMT4 has 3195 (10.7%) zeros [Zeros]
BILL_AMT5 has 3506 (11.7%) zeros [Zeros]
BILL_AMT6 has 4020 (13.4%) zeros [Zeros]
PAY_AMT1 has 5249 (17.5%) zeros [Zeros]
PAY_AMT2 has 5396 (18.0%) zeros [Zeros]
PAY_AMT3 has 5968 (19.9%) zeros [Zeros]
PAY_AMT4 has 6408 (21.4%) zeros [Zeros]
PAY_AMT5 has 6703 (22.3%) zeros [Zeros]
PAY_AMT6 has 7173 (23.9%) zeros [Zeros]

Variables

ID
Real number (ℝ≥0)

UNIFORM
UNIQUE
Distinct count: 30000
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15000.5
Minimum: 1
Maximum: 30000
Zeros: 0
Zeros (%): 0.0%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1500.95
Q1: 7500.75
Median: 15000.5
Q3: 22500.25
95-th percentile: 28500.05
Maximum: 30000
Range: 29999
Interquartile range (IQR): 14999.5

Descriptive statistics

Standard deviation: 8660.398374
Coefficient of variation (CV): 0.5773406469
Kurtosis: -1.2
Mean: 15000.5
Median Absolute Deviation (MAD): 7500
Skewness: 0
Sum: 450015000
Variance: 75002500
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.e+00 3.e+04], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
2047 1 < 0.1%
 
1322 1 < 0.1%
 
15629 1 < 0.1%
 
9486 1 < 0.1%
 
11535 1 < 0.1%
 
21792 1 < 0.1%
 
23841 1 < 0.1%
 
17698 1 < 0.1%
 
19747 1 < 0.1%
 
29988 1 < 0.1%
 
Other values (29990) 29990 > 99.9%
 
ValueCountFrequency (%) 
1 1 < 0.1%
 
2 1 < 0.1%
 
3 1 < 0.1%
 
4 1 < 0.1%
 
5 1 < 0.1%
 
ValueCountFrequency (%) 
30000 1 < 0.1%
 
29999 1 < 0.1%
 
29998 1 < 0.1%
 
29997 1 < 0.1%
 
29996 1 < 0.1%
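The ID statistics above can be checked by hand. The following is a minimal sketch, assuming ID holds the consecutive integers 1 to 30000 (which the reported minimum, maximum, and distinct count suggest) and assuming percentiles are computed by linear interpolation between neighbouring values. The helper `percentile` is illustrative and not part of the report.

```python
import math
import statistics

# Assumption: ID is exactly the integers 1..30000 (min 1, max 30000, all unique).
ids = list(range(1, 30001))


def percentile(sorted_values, q):
    """Percentile q (0..1) by linear interpolation between neighbouring values."""
    pos = q * (len(sorted_values) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


mean = statistics.mean(ids)
variance = statistics.variance(ids)  # sample variance (divides by n - 1)
std = math.sqrt(variance)

summary = {
    "sum": sum(ids),
    "mean": mean,
    "variance": variance,
    "std": std,
    "cv": std / mean,  # coefficient of variation
    "p5": round(percentile(ids, 0.05), 2),
    "q1": round(percentile(ids, 0.25), 2),
    "median": statistics.median(ids),
    "q3": round(percentile(ids, 0.75), 2),
    "p95": round(percentile(ids, 0.95), 2),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Under these assumptions the sketch reproduces the sum, mean, variance, and quantiles listed for ID, which suggests the report uses the sample (n - 1) variance.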
 

LIMIT_BAL
Real number (ℝ≥0)

Distinct count: 81
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 167484.3227
Minimum: 10000
Maximum: 1000000
Zeros: 0
Zeros (%): 0.0%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 10000
5-th percentile: 20000
Q1: 50000
Median: 140000
Q3: 240000
95-th percentile: 430000
Maximum: 1000000
Range: 990000
Interquartile range (IQR): 190000

Descriptive statistics

Standard deviation: 129747.6616
Coefficient of variation (CV): 0.7746854124
Kurtosis: 0.5362628964
Mean: 167484.3227
Median Absolute Deviation (MAD): 104957.0008
Skewness: 0.9928669605
Sum: 5024529680
Variance: 1.683445568e+10
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 10000. 13000. 18000. 25000. 35000. ... 505000. 525000. 645000. 755000. 1000000.], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
50000 3365 11.2%
 
20000 1976 6.6%
 
30000 1610 5.4%
 
80000 1567 5.2%
 
200000 1528 5.1%
 
150000 1110 3.7%
 
100000 1048 3.5%
 
180000 995 3.3%
 
360000 881 2.9%
 
60000 825 2.8%
 
Other values (71) 15095 50.3%
 
ValueCountFrequency (%) 
10000 493 1.6%
 
16000 2 < 0.1%
 
20000 1976 6.6%
 
30000 1610 5.4%
 
40000 230 0.8%
 
ValueCountFrequency (%) 
1000000 1 < 0.1%
 
800000 2 < 0.1%
 
780000 2 < 0.1%
 
760000 1 < 0.1%
 
750000 4 < 0.1%
 

SEX
Categorical

HIGH CORRELATION
Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.5 KiB
ValueCountFrequency (%) 
2 18112 60.4%
 
1 11888 39.6%
 

Length

Max length: 1
Mean length: 1
Min length: 1
ValueCountFrequency (%) 
Decimal_Number 2 100.0%
 
ValueCountFrequency (%) 
Common 2 100.0%
 
ValueCountFrequency (%) 
ASCII 2 100.0%
 

EDUCATION
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.5 KiB
ValueCountFrequency (%) 
2 14030 46.8%
 
1 10585 35.3%
 
3 4917 16.4%
 
0 468 1.6%
 

Length

Max length: 1
Mean length: 1
Min length: 1
ValueCountFrequency (%) 
Decimal_Number 4 100.0%
 
ValueCountFrequency (%) 
Common 4 100.0%
 
ValueCountFrequency (%) 
ASCII 4 100.0%
 

MARRIAGE
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.6 KiB
ValueCountFrequency (%) 
2 15964 53.2%
 
1 13659 45.5%
 
3 323 1.1%
 
0 54 0.2%
 

Length

Max length: 1
Mean length: 1
Min length: 1
ValueCountFrequency (%) 
Decimal_Number 4 100.0%
 
ValueCountFrequency (%) 
Common 4 100.0%
 
ValueCountFrequency (%) 
ASCII 4 100.0%
 

AGE
Real number (ℝ≥0)

Distinct count: 56
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.4855
Minimum: 21
Maximum: 79
Zeros: 0
Zeros (%): 0.0%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 21
5-th percentile: 23
Q1: 28
Median: 34
Q3: 41
95-th percentile: 53
Maximum: 79
Range: 58
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 9.217904068
Coefficient of variation (CV): 0.2597653709
Kurtosis: 0.04430337824
Mean: 35.4855
Median Absolute Deviation (MAD): 7.546117967
Skewness: 0.7322458688
Sum: 1064565
Variance: 84.96975541
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[21. 21.5 22.5 23.5 26.5 ... 58.5 61.5 66.5 70.5 79. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
29 1605 5.3%
 
27 1477 4.9%
 
28 1409 4.7%
 
30 1395 4.7%
 
26 1256 4.2%
 
31 1217 4.1%
 
25 1186 4.0%
 
34 1162 3.9%
 
32 1158 3.9%
 
33 1146 3.8%
 
Other values (46) 16989 56.6%
 
ValueCountFrequency (%) 
21 67 0.2%
 
22 560 1.9%
 
23 931 3.1%
 
24 1127 3.8%
 
25 1186 4.0%
 
ValueCountFrequency (%) 
79 1 < 0.1%
 
75 3 < 0.1%
 
74 1 < 0.1%
 
73 4 < 0.1%
 
72 3 < 0.1%
 

PAY_0
Categorical

Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.8 KiB
ValueCountFrequency (%) 
0 14737 49.1%
 
-1 5686 19.0%
 
1 3688 12.3%
 
-2 2759 9.2%
 
2 2667 8.9%
 
3 322 1.1%
 
4 76 0.3%
 
5 26 0.1%
 
8 19 0.1%
 
6 11 < 0.1%
 

Length

Max length: 2
Mean length: 1.2815
Min length: 1
ValueCountFrequency (%) 
Decimal_Number 9 90.0%
 
Dash_Punctuation 1 10.0%
 
ValueCountFrequency (%) 
Common 10 100.0%
 
ValueCountFrequency (%) 
ASCII 10 100.0%
 

PAY_2
Categorical

Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.8 KiB
ValueCountFrequency (%) 
0 15730 52.4%
 
-1 6050 20.2%
 
2 3927 13.1%
 
-2 3782 12.6%
 
3 326 1.1%
 
4 99 0.3%
 
1 28 0.1%
 
5 25 0.1%
 
7 20 0.1%
 
6 12 < 0.1%
 

Length

Max length: 2
Mean length: 1.327733333
Min length: 1
ValueCountFrequency (%) 
Decimal_Number 9 90.0%
 
Dash_Punctuation 1 10.0%
 
ValueCountFrequency (%) 
Common 10 100.0%
 
ValueCountFrequency (%) 
ASCII 10 100.0%
 

PAY_3
Categorical

Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.8 KiB
ValueCountFrequency (%) 
0 15764 52.5%
 
-1 5938 19.8%
 
-2 4085 13.6%
 
2 3819 12.7%
 
3 240 0.8%
 
4 76 0.3%
 
7 27 0.1%
 
6 23 0.1%
 
5 21 0.1%
 
1 4 < 0.1%
 

Length

Max length: 2
Mean length: 1.3341
Min length: 1
ValueCountFrequency (%) 
Decimal_Number 9 90.0%
 
Dash_Punctuation 1 10.0%
 
ValueCountFrequency (%) 
Common 10 100.0%
 
ValueCountFrequency (%) 
ASCII 10 100.0%
 

PAY_4
Categorical

Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.8 KiB
ValueCountFrequency (%) 
0 16455 54.9%
 
-1 5687 19.0%
 
-2 4348 14.5%
 
2 3159 10.5%
 
3 180 0.6%
 
4 69 0.2%
 
7 58 0.2%
 
5 35 0.1%
 
6 5 < 0.1%
 
8 2 < 0.1%
 

Length

Max length: 2
Mean length: 1.3345
Min length: 1
ValueCountFrequency (%) 
Decimal_Number 9 90.0%
 
Dash_Punctuation 1 10.0%
 
ValueCountFrequency (%) 
Common 10 100.0%
 
ValueCountFrequency (%) 
ASCII 10 100.0%
 

PAY_5
Categorical

Distinct count: 10
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.8 KiB
ValueCountFrequency (%) 
0 16947 56.5%
 
-1 5539 18.5%
 
-2 4546 15.2%
 
2 2626 8.8%
 
3 178 0.6%
 
4 84 0.3%
 
7 58 0.2%
 
5 17 0.1%
 
6 4 < 0.1%
 
8 1 < 0.1%
 

Length

Max length: 2
Mean length: 1.336166667
Min length: 1
ValueCountFrequency (%) 
Decimal_Number 9 90.0%
 
Dash_Punctuation 1 10.0%
 
ValueCountFrequency (%) 
Common 10 100.0%
 
ValueCountFrequency (%) 
ASCII 10 100.0%
 

PAY_6
Categorical

Distinct count: 10
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.8 KiB
ValueCountFrequency (%) 
0 16286 54.3%
 
-1 5740 19.1%
 
-2 4895 16.3%
 
2 2766 9.2%
 
3 184 0.6%
 
4 49 0.2%
 
7 46 0.2%
 
6 19 0.1%
 
5 13 < 0.1%
 
8 2 < 0.1%
 

Length

Max length: 2
Mean length: 1.3545
Min length: 1
ValueCountFrequency (%) 
Decimal_Number 9 90.0%
 
Dash_Punctuation 1 10.0%
 
ValueCountFrequency (%) 
Common 10 100.0%
 
ValueCountFrequency (%) 
ASCII 10 100.0%
 

BILL_AMT1
Real number (ℝ)

HIGH CORRELATION
ZEROS
Distinct count: 22723
Unique (%): 75.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 51223.3309
Minimum: -165580
Maximum: 964511
Zeros: 2008
Zeros (%): 6.7%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -165580
5-th percentile: 0
Q1: 3558.75
Median: 22381.5
Q3: 67091
95-th percentile: 201203.05
Maximum: 964511
Range: 1130091
Interquartile range (IQR): 63532.25

Descriptive statistics

Standard deviation: 73635.86058
Coefficient of variation (CV): 1.437545339
Kurtosis: 9.806289341
Mean: 51223.3309
Median Absolute Deviation (MAD): 50502.00599
Skewness: 2.663861022
Sum: 1536699927
Variance: 5422239963
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-165580. -14847. -6352.5 -2223. -1032.5 ... 311489. 390634.5 509866. 641760. 964511. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 2008 6.7%
 
390 244 0.8%
 
780 76 0.3%
 
326 72 0.2%
 
316 63 0.2%
 
2500 59 0.2%
 
396 49 0.2%
 
2400 39 0.1%
 
416 29 0.1%
 
500 25 0.1%
 
Other values (22713) 27336 91.1%
 
ValueCountFrequency (%) 
-165580 1 < 0.1%
 
-154973 1 < 0.1%
 
-15308 1 < 0.1%
 
-14386 1 < 0.1%
 
-11545 1 < 0.1%
 
ValueCountFrequency (%) 
964511 1 < 0.1%
 
746814 1 < 0.1%
 
653062 1 < 0.1%
 
630458 1 < 0.1%
 
626648 1 < 0.1%
 

BILL_AMT2
Real number (ℝ)

HIGH CORRELATION
ZEROS
Distinct count: 22346
Unique (%): 74.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49179.07517
Minimum: -69777
Maximum: 983931
Zeros: 2506
Zeros (%): 8.4%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -69777
5-th percentile: 0
Q1: 2984.75
Median: 21200
Q3: 64006.25
95-th percentile: 194792.2
Maximum: 983931
Range: 1053708
Interquartile range (IQR): 61021.5

Descriptive statistics

Standard deviation: 71173.76878
Coefficient of variation (CV): 1.447236829
Kurtosis: 10.30294592
Mean: 49179.07517
Median Absolute Deviation (MAD): 48673.54453
Skewness: 2.705220853
Sum: 1475372255
Variance: 5065705363
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-6.97770e+04 -9.48450e+03 -2.97650e+03 -1.04700e+03 -4.22500e+02 ... 3.24529e+05 4.00555e+05 5.12588e+05 6.01868e+05 9.83931e+05], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 2506 8.4%
 
390 231 0.8%
 
326 75 0.2%
 
780 75 0.2%
 
316 72 0.2%
 
2500 51 0.2%
 
396 51 0.2%
 
2400 42 0.1%
 
-200 29 0.1%
 
416 28 0.1%
 
Other values (22336) 26840 89.5%
 
ValueCountFrequency (%) 
-69777 1 < 0.1%
 
-67526 1 < 0.1%
 
-33350 1 < 0.1%
 
-30000 1 < 0.1%
 
-26214 1 < 0.1%
 
ValueCountFrequency (%) 
983931 1 < 0.1%
 
743970 1 < 0.1%
 
671563 1 < 0.1%
 
646770 1 < 0.1%
 
624475 1 < 0.1%
 

BILL_AMT3
Real number (ℝ)

HIGH CORRELATION
ZEROS
Distinct count: 22026
Unique (%): 73.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 47013.1548
Minimum: -157264
Maximum: 1664089
Zeros: 2870
Zeros (%): 9.6%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -157264
5-th percentile: 0
Q1: 2666.25
Median: 20088.5
Q3: 60164.75
95-th percentile: 187821.05
Maximum: 1664089
Range: 1821353
Interquartile range (IQR): 57498.5

Descriptive statistics

Standard deviation: 69349.38743
Coefficient of variation (CV): 1.475106015
Kurtosis: 19.78325514
Mean: 47013.1548
Median Absolute Deviation (MAD): 46873.96302
Skewness: 3.087830046
Sum: 1410394644
Variance: 4809337537
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-1.572640e+05 -1.680800e+04 -5.286500e+03 -2.802000e+03 -1.065000e+03 ... 3.080855e+05 3.956640e+05 4.993875e+05 5.881930e+05 1.664089e+06], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 2870 9.6%
 
390 275 0.9%
 
780 74 0.2%
 
326 63 0.2%
 
316 62 0.2%
 
396 48 0.2%
 
2500 40 0.1%
 
2400 39 0.1%
 
416 29 0.1%
 
200 27 0.1%
 
Other values (22016) 26473 88.2%
 
ValueCountFrequency (%) 
-157264 1 < 0.1%
 
-61506 1 < 0.1%
 
-46127 1 < 0.1%
 
-34041 1 < 0.1%
 
-25443 1 < 0.1%
 
ValueCountFrequency (%) 
1664089 1 < 0.1%
 
855086 1 < 0.1%
 
693131 1 < 0.1%
 
689643 1 < 0.1%
 
689627 1 < 0.1%
 

BILL_AMT4
Real number (ℝ)

HIGH CORRELATION
ZEROS
Distinct count: 21548
Unique (%): 71.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43262.94897
Minimum: -170000
Maximum: 891586
Zeros: 3195
Zeros (%): 10.7%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -170000
5-th percentile: 0
Q1: 2326.75
Median: 19052
Q3: 54506
95-th percentile: 174333.35
Maximum: 891586
Range: 1061586
Interquartile range (IQR): 52179.25

Descriptive statistics

Standard deviation: 64332.85613
Coefficient of variation (CV): 1.487019671
Kurtosis: 11.30932483
Mean: 43262.94897
Median Absolute Deviation (MAD): 43639.00712
Skewness: 2.821965291
Sum: 1297888469
Variance: 4138716378
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-170000. -25896.5 -6121.5 -2976.5 -1570. ... 320757.5 390539. 489393. 570919.5 891586. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 3195 10.7%
 
390 246 0.8%
 
780 101 0.3%
 
316 68 0.2%
 
326 62 0.2%
 
396 44 0.1%
 
150 39 0.1%
 
2400 39 0.1%
 
2500 34 0.1%
 
1000 33 0.1%
 
Other values (21538) 26139 87.1%
 
ValueCountFrequency (%) 
-170000 1 < 0.1%
 
-81334 1 < 0.1%
 
-65167 1 < 0.1%
 
-50616 1 < 0.1%
 
-46627 1 < 0.1%
 
ValueCountFrequency (%) 
891586 1 < 0.1%
 
706864 1 < 0.1%
 
628699 1 < 0.1%
 
616836 1 < 0.1%
 
572805 1 < 0.1%
 

BILL_AMT5
Real number (ℝ)

HIGH CORRELATION
ZEROS
Distinct count: 21010
Unique (%): 70.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40311.40097
Minimum: -81334
Maximum: 927171
Zeros: 3506
Zeros (%): 11.7%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -81334
5-th percentile: 0
Q1: 1763
Median: 18104.5
Q3: 50190.5
95-th percentile: 165794.3
Maximum: 927171
Range: 1008505
Interquartile range (IQR): 48427.5

Descriptive statistics

Standard deviation: 60797.15577
Coefficient of variation (CV): 1.508187617
Kurtosis: 12.30588129
Mean: 40311.40097
Median Absolute Deviation (MAD): 41211.06439
Skewness: 2.876379867
Sum: 1209342029
Variance: 3696294150
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-81334. -10657.5 -5042. -1981.5 -1003. ... 265940. 311764. 370768. 520227. 927171. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 3506 11.7%
 
390 235 0.8%
 
780 94 0.3%
 
316 79 0.3%
 
326 62 0.2%
 
150 58 0.2%
 
396 47 0.2%
 
2400 39 0.1%
 
2500 37 0.1%
 
416 36 0.1%
 
Other values (21000) 25807 86.0%
 
ValueCountFrequency (%) 
-81334 1 < 0.1%
 
-61372 1 < 0.1%
 
-53007 1 < 0.1%
 
-46627 1 < 0.1%
 
-37594 1 < 0.1%
 
ValueCountFrequency (%) 
927171 1 < 0.1%
 
823540 1 < 0.1%
 
587067 1 < 0.1%
 
551702 1 < 0.1%
 
547880 1 < 0.1%
 

BILL_AMT6
Real number (ℝ)

HIGH CORRELATION
ZEROS
Distinct count: 20604
Unique (%): 68.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38871.7604
Minimum: -339603
Maximum: 961664
Zeros: 4020
Zeros (%): 13.4%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -339603
5-th percentile: 0
Q1: 1256
Median: 17071
Q3: 49198.25
95-th percentile: 161912
Maximum: 961664
Range: 1301267
Interquartile range (IQR): 47942.25

Descriptive statistics

Standard deviation: 59554.10754
Coefficient of variation (CV): 1.53206613
Kurtosis: 12.27070529
Mean: 38871.7604
Median Absolute Deviation (MAD): 40381.46803
Skewness: 2.846644576
Sum: 1166152812
Variance: 3546691724
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-339603. -54251.5 -24295. -6106. -3020. ... 311963.5 365432.5 439967.5 527638.5 961664. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 4020 13.4%
 
390 207 0.7%
 
780 86 0.3%
 
150 78 0.3%
 
316 77 0.3%
 
326 56 0.2%
 
396 45 0.1%
 
416 36 0.1%
 
-18 33 0.1%
 
2400 32 0.1%
 
Other values (20594) 25330 84.4%
 
ValueCountFrequency (%) 
-339603 1 < 0.1%
 
-209051 1 < 0.1%
 
-150953 1 < 0.1%
 
-94625 1 < 0.1%
 
-73895 1 < 0.1%
 
ValueCountFrequency (%) 
961664 1 < 0.1%
 
699944 1 < 0.1%
 
568638 1 < 0.1%
 
527711 1 < 0.1%
 
527566 1 < 0.1%
 

PAY_AMT1
Real number (ℝ≥0)

ZEROS
Distinct count: 7943
Unique (%): 26.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5663.5805
Minimum: 0
Maximum: 873552
Zeros: 5249
Zeros (%): 17.5%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1000
Median: 2100
Q3: 5006
95-th percentile: 18428.2
Maximum: 873552
Range: 873552
Interquartile range (IQR): 4006

Descriptive statistics

Standard deviation: 16563.28035
Coefficient of variation (CV): 2.924524575
Kurtosis: 415.2547427
Mean: 5663.5805
Median Absolute Deviation (MAD): 5922.429753
Skewness: 14.66836433
Sum: 169907415
Variance: 274342256.1
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 5.000000e-01 6.500000e+00 1.750000e+01 1.635000e+02 ... 1.000740e+05 1.017880e+05 1.647010e+05 3.034075e+05 8.735520e+05], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 5249 17.5%
 
2000 1363 4.5%
 
3000 891 3.0%
 
5000 698 2.3%
 
1500 507 1.7%
 
4000 426 1.4%
 
10000 401 1.3%
 
1000 365 1.2%
 
2500 298 1.0%
 
6000 294 1.0%
 
Other values (7933) 19508 65.0%
 
ValueCountFrequency (%) 
0 5249 17.5%
 
1 9 < 0.1%
 
2 14 < 0.1%
 
3 15 0.1%
 
4 18 0.1%
 
ValueCountFrequency (%) 
873552 1 < 0.1%
 
505000 1 < 0.1%
 
493358 1 < 0.1%
 
423903 1 < 0.1%
 
405016 1 < 0.1%
 

PAY_AMT2
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 7899
Unique (%): 26.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5921.1635
Minimum: 0
Maximum: 1684259
Zeros: 5396
Zeros (%): 18.0%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 833
Median: 2009
Q3: 5000
95-th percentile: 19004.35
Maximum: 1684259
Range: 1684259
Interquartile range (IQR): 4167

Descriptive statistics

Standard deviation: 23040.8704
Coefficient of variation (CV): 3.891274139
Kurtosis: 1641.631911
Mean: 5921.1635
Median Absolute Deviation (MAD): 6478.832166
Skewness: 30.45381745
Sum: 177634905
Variance: 530881708.9
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 5.000000e-01 5.500000e+00 1.650000e+01 3.050000e+01 ... 1.000805e+05 1.500855e+05 2.066760e+05 4.082775e+05 1.684259e+06], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 5396 18.0%
 
2000 1290 4.3%
 
3000 857 2.9%
 
5000 717 2.4%
 
1000 594 2.0%
 
1500 521 1.7%
 
4000 410 1.4%
 
10000 318 1.1%
 
6000 283 0.9%
 
2500 251 0.8%
 
Other values (7889) 19363 64.5%
 
ValueCountFrequency (%) 
0 5396 18.0%
 
1 15 0.1%
 
2 20 0.1%
 
3 18 0.1%
 
4 11 < 0.1%
 
ValueCountFrequency (%) 
1684259 1 < 0.1%
 
1227082 1 < 0.1%
 
1215471 1 < 0.1%
 
1024516 1 < 0.1%
 
580464 1 < 0.1%
 

PAY_AMT3
Real number (ℝ≥0)

ZEROS
Distinct count: 7518
Unique (%): 25.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5225.6815
Minimum: 0
Maximum: 896040
Zeros: 5968
Zeros (%): 19.9%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 390
Median: 1800
Q3: 4505
95-th percentile: 17589.4
Maximum: 896040
Range: 896040
Interquartile range (IQR): 4115

Descriptive statistics

Standard deviation: 17606.96147
Coefficient of variation (CV): 3.36931393
Kurtosis: 564.3112295
Mean: 5225.6815
Median Absolute Deviation (MAD): 5866.072007
Skewness: 17.21663544
Sum: 156770445
Variance: 310005092.2
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 5.000000e-01 1.250000e+01 3.450000e+01 1.495000e+02 ... 1.000895e+05 1.642195e+05 2.376245e+05 4.092800e+05 8.960400e+05], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 5968 19.9%
 
2000 1285 4.3%
 
1000 1103 3.7%
 
3000 870 2.9%
 
5000 721 2.4%
 
1500 490 1.6%
 
4000 381 1.3%
 
10000 312 1.0%
 
1200 243 0.8%
 
6000 241 0.8%
 
Other values (7508) 18386 61.3%
 
ValueCountFrequency (%) 
0 5968 19.9%
 
1 13 < 0.1%
 
2 19 0.1%
 
3 14 < 0.1%
 
4 15 0.1%
 
ValueCountFrequency (%) 
896040 1 < 0.1%
 
889043 1 < 0.1%
 
508229 1 < 0.1%
 
417588 1 < 0.1%
 
400972 1 < 0.1%
 

PAY_AMT4
Real number (ℝ≥0)

ZEROS
Distinct count: 6937
Unique (%): 23.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4826.076867
Minimum: 0
Maximum: 621000
Zeros: 6408
Zeros (%): 21.4%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 296
Median: 1500
Q3: 4013.25
95-th percentile: 16014.95
Maximum: 621000
Range: 621000
Interquartile range (IQR): 3717.25

Descriptive statistics

Standard deviation: 15666.15974
Coefficient of variation (CV): 3.246147995
Kurtosis: 277.3337677
Mean: 4826.076867
Median Absolute Deviation (MAD): 5532.726692
Skewness: 12.90498482
Sum: 144782306
Variance: 245428561.1
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000e+00 5.00000e-01 6.50000e+00 1.85000e+01 9.95000e+01 ... 1.00052e+05 1.24744e+05 2.03538e+05 3.31385e+05 6.21000e+05], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 6408 21.4%
 
1000 1394 4.6%
 
2000 1214 4.0%
 
3000 887 3.0%
 
5000 810 2.7%
 
1500 441 1.5%
 
4000 402 1.3%
 
10000 341 1.1%
 
2500 259 0.9%
 
500 258 0.9%
 
Other values (6927) 17586 58.6%
 
ValueCountFrequency (%) 
0 6408 21.4%
 
1 22 0.1%
 
2 22 0.1%
 
3 13 < 0.1%
 
4 20 0.1%
 
ValueCountFrequency (%) 
621000 1 < 0.1%
 
528897 1 < 0.1%
 
497000 1 < 0.1%
 
432130 1 < 0.1%
 
400046 1 < 0.1%
 

PAY_AMT5
Real number (ℝ≥0)

ZEROS
Distinct count: 6897
Unique (%): 23.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4799.387633
Minimum: 0
Maximum: 426529
Zeros: 6703
Zeros (%): 22.3%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 252.5
Median: 1500
Q3: 4031.5
95-th percentile: 16000
Maximum: 426529
Range: 426529
Interquartile range (IQR): 3779

Descriptive statistics

Standard deviation: 15278.30568
Coefficient of variation (CV): 3.183386475
Kurtosis: 180.0639402
Mean: 4799.387633
Median Absolute Deviation (MAD): 5482.146365
Skewness: 11.12741705
Sum: 143981629
Variance: 233426624.4
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 5.000000e-01 4.500000e+00 2.350000e+01 9.950000e+01 ... 9.991100e+04 1.000720e+05 1.100710e+05 2.153995e+05 4.265290e+05], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 6703 22.3%
 
1000 1340 4.5%
 
2000 1323 4.4%
 
3000 947 3.2%
 
5000 814 2.7%
 
1500 426 1.4%
 
4000 401 1.3%
 
10000 343 1.1%
 
500 250 0.8%
 
6000 247 0.8%
 
Other values (6887) 17206 57.4%
 
ValueCountFrequency (%) 
0 6703 22.3%
 
1 21 0.1%
 
2 13 < 0.1%
 
3 13 < 0.1%
 
4 12 < 0.1%
 
ValueCountFrequency (%) 
426529 1 < 0.1%
 
417990 1 < 0.1%
 
388071 1 < 0.1%
 
379267 1 < 0.1%
 
332000 1 < 0.1%
 

PAY_AMT6
Real number (ℝ≥0)

ZEROS
Distinct count: 6939
Unique (%): 23.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5215.502567
Minimum: 0
Maximum: 528666
Zeros: 7173
Zeros (%): 23.9%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 117.75
Median: 1500
Q3: 4000
95-th percentile: 17343.8
Maximum: 528666
Range: 528666
Interquartile range (IQR): 3882.25

Descriptive statistics

Standard deviation: 17777.46578
Coefficient of variation (CV): 3.408581541
Kurtosis: 167.1614296
Mean: 5215.502567
Median Absolute Deviation (MAD): 6199.318675
Skewness: 10.64072733
Sum: 156465077
Variance: 316038289.4
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 5.000000e-01 4.500000e+00 1.850000e+01 9.950000e+01 ... 1.000195e+05 1.223750e+05 2.013000e+05 2.889910e+05 5.286660e+05], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 7173 23.9%
 
1000 1299 4.3%
 
2000 1295 4.3%
 
3000 914 3.0%
 
5000 808 2.7%
 
1500 439 1.5%
 
4000 411 1.4%
 
10000 356 1.2%
 
500 247 0.8%
 
6000 220 0.7%
 
Other values (6929) 16838 56.1%
 
ValueCountFrequency (%) 
0 7173 23.9%
 
1 20 0.1%
 
2 9 < 0.1%
 
3 14 < 0.1%
 
4 12 < 0.1%
 
ValueCountFrequency (%) 
528666 1 < 0.1%
 
527143 1 < 0.1%
 
443001 1 < 0.1%
 
422000 1 < 0.1%
 
403500 1 < 0.1%
 

DEFAULT
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.5 KiB
ValueCountFrequency (%) 
0 23364 77.9%
 
1 6636 22.1%
 

CREDIT_GROUP
Categorical

Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.8 KiB
ValueCountFrequency (%) 
(9010.0, 175000.0] 17626 58.8%
 
(175000.0, 340000.0] 8786 29.3%
 
(340000.0, 505000.0] 3382 11.3%
 
(505000.0, 670000.0] 170 0.6%
 
(670000.0, 835000.0] 35 0.1%
 
(835000.0, 1000000.0] 1 < 0.1%
 

Length

Max length: 21
Mean length: 18.82496667
Min length: 18
ValueCountFrequency (%) 
Decimal_Number 9 64.3%
 
Other_Punctuation 2 14.3%
 
Close_Punctuation 1 7.1%
 
Space_Separator 1 7.1%
 
Open_Punctuation 1 7.1%
 
ValueCountFrequency (%) 
Common 14 100.0%
 
ValueCountFrequency (%) 
ASCII 14 100.0%
 

IS_FEMALE
Categorical

HIGH CORRELATION
Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.5 KiB
ValueCountFrequency (%) 
1 18112 60.4%
 
0 11888 39.6%
 

Length

Max length: 1
Mean length: 1
Min length: 1
ValueCountFrequency (%) 
Decimal_Number 2 100.0%
 
ValueCountFrequency (%) 
Common 2 100.0%
 
ValueCountFrequency (%) 
ASCII 2 100.0%
 

AGE_GROUP
Categorical

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.8 KiB
ValueCountFrequency (%) 
(31.0, 37.0] 6728 22.4%
 
(20.999, 27.0] 6604 22.0%
 
(43.0, 79.0] 5986 20.0%
 
(27.0, 31.0] 5626 18.8%
 
(37.0, 43.0] 5056 16.9%
 

Length

Max length: 14
Mean length: 12.44026667
Min length: 12
ValueCountFrequency (%) 
Decimal_Number 7 58.3%
 
Other_Punctuation 2 16.7%
 
Close_Punctuation 1 8.3%
 
Space_Separator 1 8.3%
 
Open_Punctuation 1 8.3%
 
ValueCountFrequency (%) 
Common 12 100.0%
 
ValueCountFrequency (%) 
ASCII 12 100.0%
 

AGE_GROUP2
Categorical

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.8 KiB
ValueCountFrequency (%) 
(20.942, 32.6] 13388 44.6%
 
(32.6, 44.2] 11326 37.8%
 
(44.2, 55.8] 4442 14.8%
 
(55.8, 67.4] 799 2.7%
 
(67.4, 79.0] 45 0.1%
 

Length

Max length: 14
Mean length: 12.89253333
Min length: 12
ValueCountFrequency (%) 
Decimal_Number 9 64.3%
 
Other_Punctuation 2 14.3%
 
Close_Punctuation 1 7.1%
 
Space_Separator 1 7.1%
 
Open_Punctuation 1 7.1%
 
ValueCountFrequency (%) 
Common 14 100.0%
 
ValueCountFrequency (%) 
ASCII 14 100.0%
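The interval labels of CREDIT_GROUP and AGE_GROUP2 look like equal-width bins over the ranges of LIMIT_BAL (10000 to 1000000, 6 bins) and AGE (21 to 79, 5 bins). The report does not say how these columns were built, so the following is only a sketch under that assumption. It mimics the convention used by `pandas.cut` for right-closed bins, where the lowest edge is lowered by 0.1% of the range so that the minimum value falls inside the first interval. The function names are illustrative.

```python
def equal_width_edges(minimum, maximum, bins, precision=3):
    """Edges of `bins` equal-width, right-closed intervals over [minimum, maximum]."""
    span = maximum - minimum
    width = span / bins
    edges = [minimum + i * width for i in range(bins + 1)]
    # Lower the first edge by 0.1% of the range so `minimum` itself is included.
    edges[0] = minimum - span / 1000
    edges[-1] = float(maximum)
    return [round(edge, precision) for edge in edges]


def interval_labels(edges):
    """Format consecutive edges as right-closed interval labels."""
    return [f"({low}, {high}]" for low, high in zip(edges, edges[1:])]


credit_labels = interval_labels(equal_width_edges(10000, 1000000, bins=6))
age_labels = interval_labels(equal_width_edges(21, 79, bins=5))

for label in credit_labels + age_labels:
    print(label)
```

The labels produced this way match the categories listed for CREDIT_GROUP and AGE_GROUP2. AGE_GROUP has unequal widths and roughly equal counts, which points to quantile-based bins instead; that case is not covered here.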
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that, for a linear relationship, the steepness of the slope does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
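The definition above can be written out directly. This is a minimal pure-Python sketch, not the code the report itself runs; `pearson_r` is an illustrative name. The 1/(n - 1) factors of the covariance and the standard deviations cancel, so the sketch works with plain sums of deviations.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    # Common 1/(n - 1) factors cancel between numerator and denominator.
    cov = sum(a * b for a, b in zip(dev_x, dev_y))
    norm_x = math.sqrt(sum(a * a for a in dev_x))
    norm_y = math.sqrt(sum(b * b for b in dev_y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / (norm_x * norm_y)


xs = [1, 2, 3, 4, 5]
ys = [2 * x + 1 for x in xs]

r_linear = pearson_r(xs, ys)
# Shifting and rescaling either variable (by a positive factor) leaves r unchanged.
r_rescaled = pearson_r([10 * x - 3 for x in xs], [0.5 * y + 7 for y in ys])
# Flipping the sign of one variable flips the sign of r.
r_negative = pearson_r(xs, [-y for y in ys])
print(r_linear, r_rescaled, r_negative)
```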

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
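As a sketch of this recipe, the code below ranks both variables and applies the Pearson formula to the ranks. Giving tied values the average of their ranks is an assumption made here, not something the text specifies. All function names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dev_x, dev_y))
    return cov / math.sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic but not linear

rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
print(rho, r)
```

For this cubic relationship the ranks of X and Y coincide, so ρ reaches 1 while Pearson's r stays below 1.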

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
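The pair-counting rule can be sketched as follows. This follows the formula exactly as stated, with all n(n - 1)/2 pairs in the denominator, and counts tied pairs as neither concordant nor discordant. Libraries often apply an additional tie correction, so their results can differ when ties are present. `kendall_tau` is an illustrative name.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # direction == 0 means a tie in X or Y; it counts for neither side
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


tau_mixed = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
tau_same = kendall_tau([1, 2, 3, 4], [10, 20, 30, 40])
tau_reversed = kendall_tau([1, 2, 3, 4], [4, 3, 2, 1])
print(tau_mixed, tau_same, tau_reversed)
```

In the first example five of the six pairs are concordant and one is discordant, giving τ = (5 - 1) / 6.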

Missing values

Sample

First rows

ID LIMIT_BAL SEX EDUCATION MARRIAGE AGE PAY_0 PAY_2 PAY_3 PAY_4 PAY_5 PAY_6 BILL_AMT1 BILL_AMT2 BILL_AMT3 BILL_AMT4 BILL_AMT5 BILL_AMT6 PAY_AMT1 PAY_AMT2 PAY_AMT3 PAY_AMT4 PAY_AMT5 PAY_AMT6 DEFAULT CREDIT_GROUP IS_FEMALE AGE_GROUP AGE_GROUP2
01200002212422-1-1-2-239133102689000068900001(9010.0, 175000.0]1(20.999, 27.0](20.942, 32.6]
1212000022226-1200022682172526823272345532610100010001000020001(9010.0, 175000.0]1(20.999, 27.0](20.942, 32.6]
2390000222340000002923914027135591433114948155491518150010001000100050000(9010.0, 175000.0]1(31.0, 37.0](32.6, 44.2]
3450000221370000004699048233492912831428959295472000201912001100106910000(9010.0, 175000.0]1(31.0, 37.0](32.6, 44.2]
455000012157-10-100086175670358352094019146191312000366811000090006896790(9010.0, 175000.0]0(43.0, 79.0](55.8, 67.4]
56500001123700000064400570695760819394196192002425001815657100010008000(9010.0, 175000.0]0(31.0, 37.0](32.6, 44.2]
67500000112290000003679654120234450075426534830034739445500040000380002023913750137700(340000.0, 505000.0]0(27.0, 31.0](20.942, 32.6]
78100000222230-1-100-111876380601221-1595673806010581168715420(9010.0, 175000.0]1(20.999, 27.0](20.942, 32.6]
891400002312800200011285140961210812211117933719332904321000100010000(9010.0, 175000.0]1(27.0, 31.0](20.942, 32.6]
9102000013235-2-2-2-2-1-10000130071391200013007112200(9010.0, 175000.0]0(31.0, 37.0](32.6, 44.2]

Last rows

ID LIMIT_BAL SEX EDUCATION MARRIAGE AGE PAY_0 PAY_2 PAY_3 PAY_4 PAY_5 PAY_6 BILL_AMT1 BILL_AMT2 BILL_AMT3 BILL_AMT4 BILL_AMT5 BILL_AMT6 PAY_AMT1 PAY_AMT2 PAY_AMT3 PAY_AMT4 PAY_AMT5 PAY_AMT6 DEFAULT CREDIT_GROUP IS_FEMALE AGE_GROUP AGE_GROUP2
29990299911400001214100000013832513714213911013826249675461216000700042281505200020000(9010.0, 175000.0]0(37.0, 43.0](32.6, 44.2]
2999129992210000121343222222500250025002500250025000000001(175000.0, 340000.0]0(31.0, 37.0](32.6, 44.2]
29992299931000013143000-2-2-288021040000002000000000(9010.0, 175000.0]0(37.0, 43.0](32.6, 44.2]
2999329994100000112380-1-100030421427102996706266947355004200011178440003000200020000(9010.0, 175000.0]0(37.0, 43.0](32.6, 44.2]
299942999580000122342222227255777708793847751982607811587000350007000040001(9010.0, 175000.0]0(31.0, 37.0](32.6, 44.2]
29995299962200001313900000018894819281520836588004312371598085002000050033047500010000(175000.0, 340000.0]0(37.0, 43.0](32.6, 44.2]
299962999715000013243-1-1-1-100168318283502897951900183735268998129000(9010.0, 175000.0]0(37.0, 43.0](32.6, 44.2]
29997299983000012237432-10035653356275820878205821935700220004200200031001(9010.0, 175000.0]0(31.0, 37.0](32.6, 44.2]
299982999980000131411-1000-1-16457837976304527741185548944859003409117819265296418041(9010.0, 175000.0]0(37.0, 43.0](32.6, 44.2]
299993000050000121460000004792948905497643653532428153132078180014301000100010001(9010.0, 175000.0]0(43.0, 79.0](44.2, 55.8]